
    Stochastic Training of Neural Networks via Successive Convex Approximations

    This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Unlike related approaches (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss) and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems, and on a large-scale dataset involving simulated physical data. The results show that the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power. Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems.
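    As a rough illustration of the SCA idea, the sketch below trains a tiny one-hidden-layer network by repeatedly minimizing a strongly convex surrogate built from first-order information only: the loss is linearized around the current iterate, a proximal term enforces strong convexity, and an L1 regularizer is kept intact so the surrogate has a closed-form soft-thresholding minimizer. The network, data, and diminishing step-size rule are illustrative assumptions; the paper's surrogates can preserve more structure of the learning problem than this minimal linearized variant.

```python
# Minimal SCA-style training sketch (assumptions: tiny 1-hidden-layer NN,
# squared loss, L1 regularization, fully linearized surrogate).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sin(X @ rng.standard_normal(5))           # synthetic regression target

W1 = rng.standard_normal((5, 16)) * 0.1          # hidden-layer weights
w2 = rng.standard_normal(16) * 0.1               # output weights
lam, tau = 1e-3, 1.0                             # L1 weight and surrogate curvature

def forward(W1, w2):
    H = np.tanh(X @ W1)
    return H, H @ w2

def grads(W1, w2):
    H, pred = forward(W1, w2)
    err = (pred - y) / len(y)
    g2 = H.T @ err
    g1 = X.T @ ((err[:, None] * w2) * (1 - H ** 2))
    return g1, g2

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for k in range(200):
    g1, g2 = grads(W1, w2)
    # Minimize the strongly convex surrogate:
    # linearized loss + (tau/2)||w - w_k||^2 + lam*||w||_1  ->  soft-thresholding.
    W1_hat = soft_threshold(W1 - g1 / tau, lam / tau)
    w2_hat = soft_threshold(w2 - g2 / tau, lam / tau)
    gamma = 1.0 / (k + 2)                        # diminishing convex-combination step
    W1 = W1 + gamma * (W1_hat - W1)
    w2 = w2 + gamma * (w2_hat - w2)

_, pred = forward(W1, w2)
print("final squared loss:", np.mean((pred - y) ** 2))
```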

    Learning from distributed data sources using random vector functional-link networks

    One of the main characteristics of many real-world big data scenarios is their distributed nature. In a machine learning context, distributed data, together with the requirements of preserving privacy and scaling up to large networks, brings the challenge of designing fully decentralized training protocols. In this paper, we explore the problem of distributed learning when the features of every pattern are spread across multiple agents (as happens, for example, in a distributed database scenario). We propose an algorithm for a particular class of neural networks, known as Random Vector Functional-Link (RVFL) networks, which is based on the Alternating Direction Method of Multipliers (ADMM) optimization algorithm. The proposed algorithm makes it possible to learn an RVFL network from multiple distributed data sources, while restricting communication to the single operation of computing a distributed average. Our experimental simulations show that the algorithm achieves a generalization accuracy comparable to a fully centralized solution, while at the same time being extremely efficient.
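    The sketch below is not the paper's algorithm: for compactness it splits samples (rather than features, as in the paper) across agents, but it keeps the two ingredients the abstract highlights, namely an RVFL random expansion and a consensus-ADMM loop whose only communication step is a distributed average. All sizes and hyperparameters are illustrative.

```python
# Consensus-ADMM ridge regression over an RVFL expansion (sample-split sketch;
# the paper's setting splits features across agents instead).
import numpy as np

rng = np.random.default_rng(0)
K, n_feat, n_hidden, lam, rho = 4, 5, 30, 1e-2, 1.0

# Fixed random expansion shared by all agents (the RVFL hidden layer).
R = rng.uniform(-1, 1, (n_feat, n_hidden))
bias = rng.uniform(-1, 1, n_hidden)
def expand(X):
    return np.hstack([X, np.tanh(X @ R + bias)])

# Local data held by each agent (synthetic).
w_true = rng.standard_normal(n_feat + n_hidden)
data = []
for _ in range(K):
    H = expand(rng.standard_normal((100, n_feat)))
    data.append((H, H @ w_true + 0.1 * rng.standard_normal(100)))

d = n_feat + n_hidden
w = np.zeros((K, d))          # local copies of the output weights
u = np.zeros((K, d))          # scaled dual variables
z = np.zeros(d)               # consensus variable

for _ in range(50):
    # Local ridge-regularized least-squares step (no communication needed).
    for k, (H, y) in enumerate(data):
        w[k] = np.linalg.solve(H.T @ H + rho * np.eye(d),
                               H.T @ y + rho * (z - u[k]))
    # Consensus step: a single distributed average of (w_k + u_k).
    z = (K * rho / (lam + K * rho)) * np.mean(w + u, axis=0)
    u += w - z                # dual update, again purely local

print("disagreement across agents:", np.linalg.norm(w - z))
```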

    Widely Linear Kernels for Complex-Valued Kernel Activation Functions

    Complex-valued neural networks (CVNNs) have been shown to be powerful nonlinear approximators when the input data can be properly modeled in the complex domain. One of the major challenges in scaling up CVNNs in practice is the design of complex activation functions. Recently, we proposed a novel framework for learning these activation functions neuron-wise in a data-dependent fashion, based on a cheap one-dimensional kernel expansion and the idea of kernel activation functions (KAFs). In this paper we argue that, despite its flexibility, this framework is still limited in the class of functions that can be modeled in the complex domain. We leverage the idea of widely linear complex kernels to extend the formulation, allowing for a richer expressiveness without an increase in the number of adaptable parameters. We test the resulting model on a set of complex-valued image classification benchmarks. Experimental results show that the resulting CVNNs can achieve higher accuracy while at the same time converging faster. Comment: Accepted at ICASSP 201
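    Below is a hedged sketch of a complex-valued kernel activation function over a fixed dictionary, with a widely-linear-style term obtained by also evaluating the kernel on the conjugated input and reusing the (conjugated) coefficients so that no extra adaptable parameters are introduced. Both the kernel choice (a real Gaussian on C viewed as R^2) and the exact way the conjugate term is tied are assumptions of the sketch, not necessarily the paper's formulation.

```python
# Widely-linear-style complex kernel activation function (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)

M, gamma = 20, 0.5
D = np.linspace(-2, 2, M) + 1j * np.linspace(-2, 2, M)                # fixed dictionary
alpha = 0.1 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))  # per-neuron coefficients

def kernel(z, d):
    # Gaussian kernel on the complex plane seen as R^2 (simplifying assumption).
    return np.exp(-gamma * np.abs(z[..., None] - d) ** 2)

def widely_linear_kaf(z):
    out = kernel(z, D) @ alpha                       # strictly linear kernel expansion
    out += kernel(np.conj(z), D) @ np.conj(alpha)    # widely linear (conjugate) part
    return out

z = rng.standard_normal(5) + 1j * rng.standard_normal(5)
print(widely_linear_kaf(z))
```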

    Bidirectional deep-readout echo state networks

    We propose a deep architecture for the classification of multivariate time series. By means of a recurrent and untrained reservoir we generate a vectorial representation that embeds temporal relationships in the data. To improve the memorization capability, we implement a bidirectional reservoir, whose last state also captures past dependencies in the input. We apply dimensionality reduction to the final reservoir states to obtain compressed, fixed-size representations of the time series. These are subsequently fed into a deep feedforward network trained to perform the final classification. We test our architecture on benchmark datasets and on a real-world use case of blood sample classification. Results show that our method performs better than a standard echo state network and, at the same time, achieves results comparable to a fully trained recurrent network, but with faster training.
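    A compact sketch of the pipeline follows: an untrained random reservoir is run over each sequence and over its time-reversed copy, the two final states are concatenated, PCA compresses the result to a fixed-size vector, and an MLP performs the classification. Data, sizes, and hyperparameters are illustrative, and scikit-learn's PCA and MLPClassifier stand in for the dimensionality reduction and the deep readout.

```python
# Bidirectional reservoir + compressed representation + deep readout (sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
N_res, n_feat, T = 100, 3, 50

# Untrained reservoir: random input and recurrent weights, spectral radius < 1.
W_in = rng.uniform(-0.5, 0.5, (N_res, n_feat))
W = rng.standard_normal((N_res, N_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def last_state(seq):
    h = np.zeros(N_res)
    for x_t in seq:
        h = np.tanh(W_in @ x_t + W @ h)
    return h

def bidir_embedding(seq):
    # Concatenate the final states of the forward and time-reversed passes.
    return np.concatenate([last_state(seq), last_state(seq[::-1])])

# Synthetic 2-class multivariate time series.
X = rng.standard_normal((60, T, n_feat))
y = rng.integers(0, 2, 60)
X[y == 1, :, 0] += 0.5

E = np.stack([bidir_embedding(s) for s in X])
E = PCA(n_components=20).fit_transform(E)        # compressed fixed-size representations
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(E, y)  # deep readout
print("train accuracy:", clf.score(E, y))
```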

    Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation

    We study distributed stochastic nonconvex optimization in multi-agent networks. We introduce a novel algorithmic framework for the distributed minimization of the sum of the expected value of a smooth (possibly nonconvex) function (the agents' sum-utility) plus a convex (possibly nonsmooth) regularizer. The proposed method hinges on successive convex approximation (SCA) techniques, leveraging dynamic consensus as a mechanism to track the average gradient among the agents, and recursive averaging to recover the expected gradient of the sum-utility function. Almost sure convergence to (stationary) solutions of the nonconvex problem is established. Finally, the method is applied to the distributed stochastic training of neural networks. Numerical results confirm the theoretical claims and illustrate the advantages of the proposed method with respect to other methods available in the literature. Comment: Proceedings of the 2019 Asilomar Conference on Signals, Systems, and Computers.
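    The schematic per-agent loop below illustrates, on a toy L1-regularized least-squares problem, the three ingredients named above: a local strongly convex surrogate (solved here by soft-thresholding), recursive averaging of stochastic gradients, and dynamic consensus (gradient tracking) over a doubly stochastic mixing matrix. The mixing matrix, step sizes, and toy problem are assumptions of the sketch, not the paper's setup or constants.

```python
# Distributed stochastic SCA with gradient tracking (schematic sketch).
import numpy as np

rng = np.random.default_rng(0)
N, d = 4, 10                                     # agents, problem dimension

# Local data: each agent holds a least-squares term; the regularizer is l1.
A_loc = [rng.standard_normal((50, d)) for _ in range(N)]
x_true = np.where(rng.random(d) < 0.3, rng.standard_normal(d), 0.0)
b_loc = [A @ x_true + 0.1 * rng.standard_normal(50) for A in A_loc]
lam, tau, rho = 0.05, 2.0, 0.1

# Doubly stochastic mixing matrix for a ring of 4 agents.
Wmix = np.array([[.5, .25, 0, .25],
                 [.25, .5, .25, 0],
                 [0, .25, .5, .25],
                 [.25, 0, .25, .5]])

def stoch_grad(i, x):
    idx = rng.integers(0, 50, 10)                # minibatch of local samples
    Ai, bi = A_loc[i][idx], b_loc[i][idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros((N, d))
dbar = np.stack([stoch_grad(i, x[i]) for i in range(N)])   # recursive gradient averages
y = dbar.copy()                                            # gradient-tracking variables

for k in range(300):
    gamma = 2.0 / (k + 4)
    # Local strongly convex surrogate: linearization (via y) + proximal term + l1.
    xhat = soft(x - y / tau, lam / tau)
    x_half = x + gamma * (xhat - x)
    x_new = Wmix @ x_half                                  # consensus averaging
    d_new = np.stack([(1 - rho) * dbar[i] + rho * stoch_grad(i, x_new[i])
                      for i in range(N)])
    y = Wmix @ y + d_new - dbar                            # dynamic consensus (tracking)
    x, dbar = x_new, d_new

print("disagreement:", np.linalg.norm(x - x.mean(0)))
print("recovered support:", np.nonzero(np.abs(x.mean(0)) > 1e-2)[0])
```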

    Pixle: a fast and effective black-box attack based on rearranging pixels

    Recent research has found that neural networks are vulnerable to several types of adversarial attacks, where the input samples are modified in such a way that the model produces a wrong prediction, misclassifying the adversarial sample. In this paper we focus on black-box adversarial attacks, which can be performed without knowing the inner structure of the attacked model or its training procedure, and we propose a novel attack that is capable of successfully attacking a high percentage of samples by rearranging a small number of pixels within the attacked image. We demonstrate that our attack works on a large number of datasets and models, that it requires a small number of iterations, and that the distance between the original sample and the adversarial one is negligible to the human eye.
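    The sketch below is a schematic random-search variant of the pixel-rearranging idea, not Pixle's exact proposal scheme: at each iteration a handful of pixel values are copied from random source positions to random destination positions, and the move is kept only if it lowers the model's confidence in the original class. The classifier is a random linear stand-in queried only through its output probabilities, mirroring the black-box assumption; image size and iteration budget are illustrative.

```python
# Black-box attack by rearranging pixels (schematic random-search sketch).
import numpy as np

rng = np.random.default_rng(0)
H = W = 16

# Stand-in black-box classifier: only its output probabilities are queried.
Wcls = rng.standard_normal((10, H * W))
def predict_proba(img):
    logits = Wcls @ img.ravel()
    e = np.exp(logits - logits.max())
    return e / e.sum()

img = rng.random((H, W))
true_label = int(np.argmax(predict_proba(img)))   # attack the initial prediction

adv = img.copy()
best_p = predict_proba(adv)[true_label]
for it in range(2000):
    cand = adv.copy()
    # Propose rearranging a handful of pixels: copy values from random
    # source positions into random destination positions.
    k = 5
    src = (rng.integers(0, H, k), rng.integers(0, W, k))
    dst = (rng.integers(0, H, k), rng.integers(0, W, k))
    cand[dst] = adv[src]
    p = predict_proba(cand)[true_label]
    if p < best_p:                    # keep the move only if it hurts the true class
        adv, best_p = cand, p
    if np.argmax(predict_proba(adv)) != true_label:
        break

print("true-class probability:", best_p, "pixels changed:", int(np.sum(adv != img)))
```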

    Continual Learning with Invertible Generative Models

    Catastrophic forgetting (CF) happens whenever a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks) and rehearsal strategies, where the network is constantly re-trained on past data. Generative models have also been applied to the latter, in order to provide an endless source of data. In this paper, we propose a novel method that combines the strengths of regularization and generative-based rehearsal approaches. Our generative model consists of a normalizing flow (NF), a probabilistic and invertible neural network, trained on the internal embeddings of the network. By keeping a single NF throughout the training process, we show that our memory overhead remains constant. In addition, exploiting the invertibility of the NF, we propose a simple approach to regularize the network's embeddings with respect to past tasks. We show that our method performs favorably with respect to state-of-the-art approaches in the literature, with bounded computational power and memory overheads. Comment: arXiv admin note: substantial text overlap with arXiv:2007.0244
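    As a minimal sketch of the generative-rehearsal ingredient, the PyTorch snippet below builds a small RealNVP-style coupling flow as a stand-in for the paper's NF, fits it by maximum likelihood on (synthetic) embeddings of a past task, and then inverts it on base samples to obtain rehearsal pseudo-embeddings. The backbone network, the classification head, and the paper's specific embedding-regularization term are omitted; all module and variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D = 8                                        # embedding dimension (illustrative)

class Coupling(nn.Module):
    """Affine coupling layer: half of the dimensions are rescaled/shifted using
    parameters predicted from the other half, so the inverse is exact."""
    def __init__(self, dim, flip):
        super().__init__()
        self.flip = flip
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))      # outputs [log_s, t]

    def _split(self, z):
        a, b = z.chunk(2, dim=-1)
        return (b, a) if self.flip else (a, b)

    def _merge(self, a, b):
        return torch.cat((b, a) if self.flip else (a, b), dim=-1)

    def forward(self, x):                    # data -> latent, with log|det J|
        a, b = self._split(x)
        log_s, t = self.net(a).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)            # keep the scales well-behaved
        return self._merge(a, (b - t) * torch.exp(-log_s)), -log_s.sum(-1)

    def inverse(self, z):                    # latent -> data (used for sampling)
        a, b = self._split(z)
        log_s, t = self.net(a).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)
        return self._merge(a, b * torch.exp(log_s) + t)

class Flow(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.dim = dim
        self.layers = nn.ModuleList(Coupling(dim, i % 2 == 1) for i in range(n_layers))
        self.base = torch.distributions.Normal(0.0, 1.0)

    def log_prob(self, x):
        logdet = 0.0
        for layer in self.layers:
            x, ld = layer(x)
            logdet = logdet + ld
        return self.base.log_prob(x).sum(-1) + logdet

    def sample(self, n):
        z = torch.randn(n, self.dim)
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z

# Stand-in for the embeddings the backbone produces on a past task.
old_embeddings = torch.randn(512, D) * 0.5 + 1.0

flow = Flow(D)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for _ in range(500):
    batch = old_embeddings[torch.randint(0, 512, (64,))]
    loss = -flow.log_prob(batch).mean()      # maximum likelihood on embeddings
    opt.zero_grad()
    loss.backward()
    opt.step()

# Rehearsal: invert the flow on base samples to get pseudo-embeddings of the past task.
rehearsal = flow.sample(64)
print("replay batch:", tuple(rehearsal.shape), "final NLL:", float(loss))
```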